Overview

Dataset statistics

Number of variables: 16
Number of observations: 3728369
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 517.2 MiB
Average record size in memory: 145.5 B
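The average record size is the total in-memory size divided by the number of observations. The quick arithmetic check below assumes binary units (1 MiB = 2**20 bytes):

```python
# Figures copied from the dataset statistics above.
total_size_mib = 517.2
n_observations = 3_728_369

# Assumption: the report uses binary units, i.e. 1 MiB = 2**20 bytes.
total_size_bytes = total_size_mib * 2**20
avg_record_size = total_size_bytes / n_observations  # bytes per row

print(f"Average record size: {avg_record_size:.1f} B")
```

The total is itself rounded to 0.1 MiB, so only the first decimal place of the result is meaningful.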

Variable types

NUM: 11
CAT: 3
DATE: 2

Reproduction

Analysis started: 2020-05-07 10:37:57.744678
Analysis finished: 2020-05-07 12:12:00.127255
Version: pandas-profiling v2.6.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Configuration file: config.yaml

Warnings

VIN has a high cardinality: 3139779 distinct values
ZavC is highly skewed (γ1 = 21.48757874)
Km has 286977 (7.7%) zeros
ZavA has 1685149 (45.2%) zeros
ZavB has 3481154 (93.4%) zeros
ZavC has 3705130 (99.4%) zeros
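These warnings come from simple threshold rules applied to each column's summary. The sketch below reproduces the Zeros and Skewed warnings from the reported figures. Both thresholds are assumptions chosen to match this report: ZavB, with skewness 7.26, is not flagged as skewed, and DrVoz, with 0.5% zeros, is not flagged for zeros. They are not values read from config.yaml.

```python
from __future__ import annotations

from dataclasses import dataclass

# ASSUMED thresholds, chosen to be consistent with the warnings above;
# they are not read from the report's config.yaml.
SKEWNESS_THRESHOLD = 20.0   # |skewness| above this -> "Skewed"
ZEROS_THRESHOLD = 0.05      # share of zeros above this -> "Zeros"


@dataclass
class NumericSummary:
    name: str
    skewness: float
    n_zeros: int
    n: int


def warnings_for(summary: NumericSummary) -> list[str]:
    """Return the warning labels triggered by one numeric column."""
    labels = []
    if abs(summary.skewness) > SKEWNESS_THRESHOLD:
        labels.append("Skewed")
    if summary.n_zeros / summary.n > ZEROS_THRESHOLD:
        labels.append("Zeros")
    return labels


N = 3_728_369  # number of observations
columns = [
    NumericSummary("Km", 4.893037318, 286_977, N),
    NumericSummary("DrVoz", -0.3956745787, 19_450, N),
    NumericSummary("ZavC", 21.48757874, 3_705_130, N),
]
for col in columns:
    print(col.name, warnings_for(col))
```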

Variables

STK
Real number (ℝ≥0)

Distinct count: 552
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3525.71763
Minimum: 3100
Maximum: 9999
Zeros: 0
Zeros (%): 0.0%
Memory size: 28.4 MiB

Quantile statistics

Minimum: 3100
5th percentile: 3114
Q1: 3245
Median: 3524
Q3: 3739
95th percentile: 3842
Maximum: 9999
Range: 6899
Interquartile range (IQR): 494
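The quantile statistics are read from the sorted data, and Range and IQR are simple differences. For STK, Range is 9999 - 3100 = 6899 and IQR is 3739 - 3245 = 494. The sketch below computes a quantile by linear interpolation between the two closest ranks, which is the default method of numpy.percentile and pandas.Series.quantile. Whether this report used exactly that method is an assumption. The sample list is illustrative and is not STK data.

```python
import math


def quantile(values: list[float], q: float) -> float:
    """Quantile with linear interpolation between the two closest ranks."""
    if not values:
        raise ValueError("quantile of empty data")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    data = sorted(values)
    pos = (len(data) - 1) * q      # fractional index into the sorted data
    lo, hi = math.floor(pos), math.ceil(pos)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)


sample = [3100, 3114, 3245, 3524, 3739, 3842, 9999]  # illustrative values only
q1, q3 = quantile(sample, 0.25), quantile(sample, 0.75)
print("IQR:", q3 - q1, "Range:", max(sample) - min(sample))
```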

Descriptive statistics

Standard deviation: 338.8045411
Coefficient of variation (CV): 0.0960952001
Kurtosis: 58.10594476
Mean: 3525.71763
Mean absolute deviation (MAD): 229.9482183
Skewness: 5.076363917
Sum: 1.314517631e+10
Variance: 114788.5171
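Several of these descriptive statistics are linked: CV = σ/μ, variance = σ², and mean = sum/n. The check below uses the STK figures copied from the report. The tolerances allow for rounding in the printed numbers.

```python
import math

# STK figures copied from the report above.
mean = 3525.71763
std = 338.8045411
variance = 114788.5171
total = 1.314517631e10
n = 3_728_369

cv = std / mean  # coefficient of variation: standard deviation relative to the mean

print(f"CV = {cv:.10f}")
print("variance == std**2:", math.isclose(std**2, variance, rel_tol=1e-6))
print("mean == sum / n:", math.isclose(total / n, mean, rel_tol=1e-6))
```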
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[3100. 3101. 3102.5 3103.5 3104.5 ... 8868. 8870.5 8886. 9998.5 9999. ], "bayesian blocks" binning strategy used)
Common values
Value  Count  Frequency (%)
3413  73558  2.0%
3307  50561  1.4%
3851  35250  0.9%
3112  33364  0.9%
3766  31486  0.8%
3114  31286  0.8%
3243  27837  0.7%
3523  27576  0.7%
3609  27574  0.7%
3754  26819  0.7%
Other values (542)  3363058  90.2%

Minimum 5 values
Value  Count  Frequency (%)
3100  18694  0.5%
3102  13076  0.4%
3103  2588  0.1%
3104  19890  0.5%
3105  16225  0.4%

Maximum 5 values
Value  Count  Frequency (%)
9999  26  < 0.1%
9998  1  < 0.1%
9911  3  < 0.1%
8901  3  < 0.1%
8871  29  < 0.1%
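These tables list the most frequent values, with the remainder collapsed into "Other values (m)", and the five smallest and five largest distinct values with their counts. For STK, m = 552 - 10 = 542. The following minimal sketch builds both kinds of table with collections.Counter. The data in the usage example is made up.

```python
from collections import Counter


def common_values(values, k=10):
    """Return the k most frequent values, plus (distinct, count) for the rest."""
    counts = Counter(values)
    top = counts.most_common(k)
    other_distinct = len(counts) - len(top)             # the "m" in "Other values (m)"
    other_count = len(values) - sum(c for _, c in top)  # rows not covered by the top k
    return top, other_distinct, other_count


def extreme_values(values, k=5):
    """Return the k smallest and k largest distinct values with their counts."""
    counts = Counter(values)
    keys = sorted(counts)
    minimum = [(v, counts[v]) for v in keys[:k]]
    maximum = [(v, counts[v]) for v in reversed(keys[-k:])]
    return minimum, maximum


data = [2, 2, 2, 3, 3, 1, 5, 7, 9, 11, 13, 13]
top, other_distinct, other_count = common_values(data, k=2)
minimum, maximum = extreme_values(data, k=3)
n = len(data)
for value, count in top:
    print(f"{value}  {count}  {100 * count / n:.1f}%")
print(f"Other values ({other_distinct})  {other_count}  {100 * other_count / n:.1f}%")
print("Minimum:", minimum)
print("Maximum:", maximum)
```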

DrTP
Real number (ℝ≥0)

Distinct count: 14
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10.036513285031605
Minimum: 0
Maximum: 13
Zeros: 4318
Zeros (%): 0.1%
Memory size: 3.6 MiB

Quantile statistics

Minimum: 0
5th percentile: 2
Q1: 5
Median: 13
Q3: 13
95th percentile: 13
Maximum: 13
Range: 13
Interquartile range (IQR): 8

Descriptive statistics

Standard deviation: 4.682111564
Coefficient of variation (CV): 0.4665077832
Kurtosis: -0.8632876642
Mean: 10.03651329
Mean absolute deviation (MAD): 4.153595745
Skewness: -1.032551861
Sum: 37419825
Variance: 21.9221687
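Skewness and kurtosis here are sample estimates. The sketch below implements the bias-adjusted estimators that pandas' Series.skew and Series.kurt use: the adjusted Fisher-Pearson G1, and the excess kurtosis G2, which is 0 for a normal distribution. It is an assumption that the report relies on these pandas estimators.

```python
import math


def _central_moments(x):
    """Return n and the 2nd, 3rd and 4th central moments (1/n normalisation)."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return n, m2, m3, m4


def skewness(x):
    """Adjusted Fisher-Pearson sample skewness G1 (needs n >= 3)."""
    if len(x) < 3:
        raise ValueError("need at least 3 values")
    n, m2, m3, _ = _central_moments(x)
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    g1 = m3 / m2 ** 1.5
    return math.sqrt(n * (n - 1)) / (n - 2) * g1


def excess_kurtosis(x):
    """Bias-adjusted sample excess kurtosis G2 (needs n >= 4)."""
    if len(x) < 4:
        raise ValueError("need at least 4 values")
    n, m2, _, m4 = _central_moments(x)
    if m2 == 0:
        raise ValueError("kurtosis is undefined for constant data")
    g2 = m4 / m2 ** 2 - 3
    return ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))


print(skewness([0, 0, 0, 1]), excess_kurtosis([1, 2, 3, 4, 5]))
```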
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 1.5 2.5 3.5 4.5 ... 9.5 10.5 11.5 12.5 13. ], "bayesian blocks" binning strategy used)
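The fixed-size histogram divides [minimum, maximum] into 10 equal-width bins. The following minimal stdlib sketch does that binning and closes the last bin on the right, as numpy.histogram does. The variable-size "bayesian blocks" histogram uses a different, adaptive algorithm; one implementation is astropy.stats.bayesian_blocks.

```python
def fixed_width_histogram(values, bins=10):
    """Count values into `bins` equal-width bins over [min, max]."""
    if not values:
        raise ValueError("empty data")
    lo, hi = min(values), max(values)
    counts = [0] * bins
    if lo == hi:                  # all values identical: put them in the first bin
        counts[0] = len(values)
        return counts
    width = (hi - lo) / bins
    for v in values:
        index = int((v - lo) / width)
        counts[min(index, bins - 1)] += 1   # the maximum belongs to the last bin
    return counts


# One observation of each DrTP code 0..13 (illustrative, not the real counts).
print(fixed_width_histogram(list(range(14))))
```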
Common values
Value  Count  Frequency (%)
13  2471966  66.3%
2  820573  22.0%
12  211330  5.7%
5  186819  5.0%
3  17993  0.5%
7  5309  0.1%
0  4318  0.1%
6  3886  0.1%
11  2581  0.1%
9  1666  < 0.1%
Other values (4)  1928  0.1%

Minimum 5 values
Value  Count  Frequency (%)
0  4318  0.1%
1  309  < 0.1%
2  820573  22.0%
3  17993  0.5%
4  145  < 0.1%

Maximum 5 values
Value  Count  Frequency (%)
13  2471966  66.3%
12  211330  5.7%
11  2581  0.1%
10  1271  < 0.1%
9  1666  < 0.1%

VIN
Categorical

HIGH CARDINALITY
UNIFORM
Distinct count: 3139779
Unique (%): 84.2%
Missing: 0
Missing (%): 0.0%
Memory size: 28.4 MiB
Common values
Value  Count  Frequency (%)
-  32  < 0.1%
005  27  < 0.1%
001  26  < 0.1%
003  25  < 0.1%
008  24  < 0.1%
004  22  < 0.1%
033  22  < 0.1%
106  22  < 0.1%
TEST0000000000001  21  < 0.1%
124  21  < 0.1%
Other values (3139769)  3728127  > 99.9%

Length

Max length: 22
Mean length: 16.466887
Min length: 1
Unicode categories
Value  Count  Frequency (%)
Uppercase_Letter  26  54.2%
Decimal_Number  10  20.8%
Other_Punctuation  6  12.5%
Close_Punctuation  1  2.1%
Dash_Punctuation  1  2.1%
Open_Punctuation  1  2.1%
Connector_Punctuation  1  2.1%
Space_Separator  1  2.1%
Math_Symbol  1  2.1%

Unicode scripts
Value  Count  Frequency (%)
Latin  26  54.2%
Common  22  45.8%

Unicode blocks
Value  Count  Frequency (%)
ASCII  48  100.0%
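The length statistics and the Unicode tables can be reproduced with the standard library. The counts in the category, script and block tables appear to count distinct characters: 26 + 10 + 6 + 6 × 1 = 48, which matches the 48 ASCII characters. The sketch below therefore counts each distinct character once. The mapping from unicodedata.category codes to long names is written out by hand, and the sample values are taken from the VIN table above.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes returned by unicodedata.category().
# Only the codes that appear in this report are listed; others fall back to the code.
CATEGORY_NAMES = {
    "Lu": "Uppercase_Letter",
    "Nd": "Decimal_Number",
    "Po": "Other_Punctuation",
    "Ps": "Open_Punctuation",
    "Pe": "Close_Punctuation",
    "Pd": "Dash_Punctuation",
    "Pc": "Connector_Punctuation",
    "Zs": "Space_Separator",
    "Sm": "Math_Symbol",
}


def length_stats(values):
    """Return (max length, mean length, min length) of a list of strings."""
    lengths = [len(v) for v in values]
    return max(lengths), sum(lengths) / len(lengths), min(lengths)


def distinct_chars_by_category(values):
    """Count each distinct character once, grouped by Unicode general category."""
    chars = set("".join(values))
    return Counter(
        CATEGORY_NAMES.get(unicodedata.category(ch), unicodedata.category(ch))
        for ch in chars
    )


vins = ["-", "005", "TEST0000000000001"]
print(length_stats(vins))
print(distinct_chars_by_category(vins))
```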

DatKont
Date

Distinct count: 3718825
Unique (%): 99.7%
Missing: 0
Missing (%): 0.0%
Memory size: 28.4 MiB
Minimum: 2018-01-02 05:38:33.517000
Maximum: 2018-12-31 16:07:02.557000
Histogram

TypMot
Real number (ℝ)

Distinct count: 62848
Unique (%): 1.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 28893.53555
Minimum: -1
Maximum: 62846
Zeros: 1
Zeros (%): < 0.1%
Memory size: 14.2 MiB

Quantile statistics

Minimum: -1
5th percentile: 24
Q1: 20775
Median: 25792
Q3: 40400
95th percentile: 58210
Maximum: 62846
Range: 62847
Interquartile range (IQR): 19625

Descriptive statistics

Standard deviation: 16330.73876
Coefficient of variation (CV): 0.5652038925
Kurtosis: -0.5567911312
Mean: 28893.53555
Mean absolute deviation (MAD): 13003.13754
Skewness: 0.1704521434
Sum: 1.077257623e+11
Variance: 266693028.5
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[-1.00000e+00 -5.00000e-01 7.50000e+00 8.50000e+00 9.50000e+00 ... 6.27635e+04 6.27645e+04 6.27675e+04 6.27745e+04 6.28460e+04], "bayesian blocks" binning strategy used)
Common values
Value  Count  Frequency (%)
-1  175087  4.7%
24  91435  2.5%
25756  36294  1.0%
22192  33405  0.9%
17805  31502  0.8%
26725  25512  0.7%
22621  25131  0.7%
40245  23539  0.6%
23162  23288  0.6%
21751  23023  0.6%
Other values (62838)  3240153  86.9%

Minimum 5 values
Value  Count  Frequency (%)
-1  175087  4.7%
0  1  < 0.1%
1  1  < 0.1%
2  1  < 0.1%
3  1  < 0.1%

Maximum 5 values
Value  Count  Frequency (%)
62846  2  < 0.1%
62845  1  < 0.1%
62844  1  < 0.1%
62843  1  < 0.1%
62842  1  < 0.1%

TZn
Real number (ℝ)

Distinct count: 6266
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3755.8111487891892
Minimum: -1
Maximum: 6264
Zeros: 4
Zeros (%): < 0.1%
Memory size: 7.1 MiB

Quantile statistics

Minimum: -1
5th percentile: 652
Q1: 2238
Median: 4515
Q3: 4930
95th percentile: 5846
Maximum: 6264
Range: 6265
Interquartile range (IQR): 2692

Descriptive statistics

Standard deviation: 1697.19653
Coefficient of variation (CV): 0.4518854817
Kurtosis: -0.9518556151
Mean: 3755.811149
Mean absolute deviation (MAD): 1476.40286
Skewness: -0.5958684209
Sum: 1.400304986e+10
Variance: 2880476.062
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[-1.0000e+00 -5.0000e-01 1.2500e+01 1.3500e+01 2.9500e+01 ... 6.2395e+03 6.2405e+03 6.2415e+03 6.2425e+03 6.2640e+03], "bayesian blocks" binning strategy used)
Common values
Value  Count  Frequency (%)
4930  898984  24.1%
1771  259757  7.0%
4515  192249  5.2%
4173  180757  4.8%
5811  166904  4.5%
5846  164297  4.4%
1033  130150  3.5%
4008  113780  3.1%
3494  105270  2.8%
2338  94842  2.5%
Other values (6256)  1421379  38.1%

Minimum 5 values
Value  Count  Frequency (%)
-1  808  < 0.1%
0  4  < 0.1%
1  5  < 0.1%
2  1  < 0.1%
3  1  < 0.1%

Maximum 5 values
Value  Count  Frequency (%)
6264  1  < 0.1%
6263  12  < 0.1%
6262  4  < 0.1%
6261  3  < 0.1%
6260  2  < 0.1%

DrVoz
Real number (ℝ)

Distinct count: 45
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 17.84187214302018
Minimum: -1
Maximum: 43
Zeros: 19450
Zeros (%): 0.5%
Memory size: 3.6 MiB

Quantile statistics

Minimum: -1
5th percentile: 7
Q1: 20
Median: 20
Q3: 20
95th percentile: 20
Maximum: 43
Range: 44
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 5.490181858
Coefficient of variation (CV): 0.3077133282
Kurtosis: 1.806544911
Mean: 17.84187214
Mean absolute deviation (MAD): 4.117401271
Skewness: -0.3956745787
Sum: 66521083
Variance: 30.14209684
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[-1. -0.5 0.5 1.5 2.5 ... 39.5 40.5 41.5 42.5 43. ], "bayesian blocks" binning strategy used)
Common values
Value  Count  Frequency (%)
20  2705196  72.6%
9  439599  11.8%
7  200821  5.4%
13  163866  4.4%
11  46953  1.3%
26  24553  0.7%
29  21431  0.6%
35  19577  0.5%
0  19450  0.5%
32  18296  0.5%
Other values (35)  68627  1.8%

Minimum 5 values
Value  Count  Frequency (%)
-1  7  < 0.1%
0  19450  0.5%
1  93  < 0.1%
2  4808  0.1%
3  1  < 0.1%

Maximum 5 values
Value  Count  Frequency (%)
43  46  < 0.1%
42  4  < 0.1%
41  10064  0.3%
40  182  < 0.1%
39  3917  0.1%

ObchOznTyp
Real number (ℝ)

Distinct count: 67142
Unique (%): 1.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 34759.20922
Minimum: -1
Maximum: 67140
Zeros: 1
Zeros (%): < 0.1%
Memory size: 14.2 MiB

Quantile statistics

Minimum: -1
5th percentile: 5324
Q1: 22436
Median: 33615
Q3: 46406
95th percentile: 63170
Maximum: 67140
Range: 67141
Interquartile range (IQR): 23970

Descriptive statistics

Standard deviation: 17213.43888
Coefficient of variation (CV): 0.4952195193
Kurtosis: -0.8557945261
Mean: 34759.20922
Mean absolute deviation (MAD): 14576.68319
Skewness: -0.05185456205
Sum: 1.295951581e+11
Variance: 296302478
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[-1.00000e+00 -5.00000e-01 4.50000e+00 1.15500e+02 1.16500e+02 ... 6.71145e+04 6.71155e+04 6.71165e+04 6.71175e+04 6.71400e+04], "bayesian blocks" binning strategy used)
Common values
Value  Count  Frequency (%)
44034  135116  3.6%
27530  122332  3.3%
44043  85570  2.3%
28050  81507  2.2%
27539  61276  1.6%
27538  56785  1.5%
44041  49926  1.3%
31327  39592  1.1%
29409  34553  0.9%
44047  31468  0.8%
Other values (67132)  3030244  81.3%

Minimum 5 values
Value  Count  Frequency (%)
-1  822  < 0.1%
0  1  < 0.1%
1  6  < 0.1%
2  8  < 0.1%
3  1  < 0.1%

Maximum 5 values
Value  Count  Frequency (%)
67140  3  < 0.1%
67139  2  < 0.1%
67138  1  < 0.1%
67137  1  < 0.1%
67136  10  < 0.1%

Ct
Real number (ℝ)

Distinct count: 135
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 43.84246865050106
Minimum: -1
Maximum: 133
Zeros: 4
Zeros (%): < 0.1%
Memory size: 7.1 MiB

Quantile statistics

Minimum: -1
5th percentile: 37
Q1: 42
Median: 42
Q3: 42
95th percentile: 55
Maximum: 133
Range: 134
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 8.516253482
Coefficient of variation (CV): 0.1942466687
Kurtosis: 22.95968419
Mean: 43.84246865
Mean absolute deviation (MAD): 4.400603515
Skewness: 2.337121156
Sum: 163460901
Variance: 72.52657338
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ -1. 3.5 4.5 5.5 6.5 ... 127.5 128.5 130.5 132.5 133. ], "bayesian blocks" binning strategy used)
Common values
Value  Count  Frequency (%)
42  2640623  70.8%
48  279634  7.5%
55  135017  3.6%
37  107751  2.9%
52  104070  2.8%
58  73324  2.0%
43  70012  1.9%
13  66544  1.8%
50  58738  1.6%
56  40783  1.1%
Other values (125)  151873  4.1%

Minimum 5 values
Value  Count  Frequency (%)
-1  1  < 0.1%
0  4  < 0.1%
1  14  < 0.1%
2  1  < 0.1%
3  2  < 0.1%

Maximum 5 values
Value  Count  Frequency (%)
133  165  < 0.1%
132  1  < 0.1%
131  1  < 0.1%
130  12  < 0.1%
129  3  < 0.1%

DatPrvReg
Date

Distinct count: 21410
Unique (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 28.4 MiB
Minimum: 1753-01-01 00:00:00
Maximum: 2030-12-22 00:00:00
Histogram

Km
Real number (ℝ≥0)

ZEROS
Distinct count: 495609
Unique (%): 13.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 158319.9791
Minimum: 0
Maximum: 9944330
Zeros: 286977
Zeros (%): 7.7%
Memory size: 28.4 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 60023
Median: 145722
Q3: 221034
95th percentile: 360322
Maximum: 9944330
Range: 9944330
Interquartile range (IQR): 161011

Descriptive statistics

Standard deviation: 142909.9388
Coefficient of variation (CV): 0.9026652205
Kurtosis: 117.5483949
Mean: 158319.9791
Mean absolute deviation (MAD): 96955.17595
Skewness: 4.893037318
Sum: 5.90275302e+11
Variance: 2.042325061e+10
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.0000000e+00 5.0000000e-01 1.5000000e+00 2.5000000e+00 3.5000000e+00 ... 2.2623180e+06 3.0509345e+06 3.6466625e+06 5.1451450e+06 9.9443300e+06], "bayesian blocks" binning strategy used)
Common values
Value  Count  Frequency (%)
0  286977  7.7%
1  2032  0.1%
3  1029  < 0.1%
7  841  < 0.1%
4  829  < 0.1%
8  794  < 0.1%
6  773  < 0.1%
10  769  < 0.1%
9  757  < 0.1%
11  717  < 0.1%
Other values (495599)  3432851  92.1%

Minimum 5 values
Value  Count  Frequency (%)
0  286977  7.7%
1  2032  0.1%
2  557  < 0.1%
3  1029  < 0.1%
4  829  < 0.1%

Maximum 5 values
Value  Count  Frequency (%)
9944330  1  < 0.1%
9633091  1  < 0.1%
9584128  1  < 0.1%
9455994  1  < 0.1%
9445862  1  < 0.1%

ZavA
Real number (ℝ≥0)

ZEROS
Distinct count: 35
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.001050862
Minimum: 0
Maximum: 44
Zeros: 1685149
Zeros (%): 45.2%
Memory size: 28.4 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 1
Q3: 3
95th percentile: 7
Maximum: 44
Range: 44
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 2.571117076
Coefficient of variation (CV): 1.28488342
Kurtosis: 2.998184476
Mean: 2.001050862
Mean absolute deviation (MAD): 2.037193931
Skewness: 1.546167723
Sum: 7460656
Variance: 6.610643017
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.5 1.5 2.5 3.5 ... 22.5 24.5 26.5 32. 44. ], "bayesian blocks" binning strategy used)
Common values
Value  Count  Frequency (%)
0  1685149  45.2%
1  424795  11.4%
2  375968  10.1%
3  332738  8.9%
4  283926  7.6%
5  235610  6.3%
6  143948  3.9%
7  93675  2.5%
8  58921  1.6%
9  35272  0.9%
Other values (25)  58367  1.6%

Minimum 5 values
Value  Count  Frequency (%)
0  1685149  45.2%
1  424795  11.4%
2  375968  10.1%
3  332738  8.9%
4  283926  7.6%

Maximum 5 values
Value  Count  Frequency (%)
44  1  < 0.1%
40  2  < 0.1%
33  1  < 0.1%
31  5  < 0.1%
30  2  < 0.1%

ZavB
Real number (ℝ≥0)

ZEROS
Distinct count: 32
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.1660004683
Minimum: 0
Maximum: 37
Zeros: 3481154
Zeros (%): 93.4%
Memory size: 28.4 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0
95th percentile: 1
Maximum: 37
Range: 37
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.8030010426
Coefficient of variation (CV): 4.837342031
Kurtosis: 74.10206896
Mean: 0.1660004683
Mean absolute deviation (MAD): 0.3099871253
Skewness: 7.258377676
Sum: 618911
Variance: 0.6448106743
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.5 1.5 2.5 3.5 ... 16.5 19.5 21.5 27.5 37. ], "bayesian blocks" binning strategy used)
Common values
Value  Count  Frequency (%)
0  3481154  93.4%
1  101735  2.7%
2  54733  1.5%
3  36185  1.0%
4  22482  0.6%
5  13134  0.4%
6  7503  0.2%
7  4466  0.1%
8  2665  0.1%
9  1652  < 0.1%
Other values (22)  2660  0.1%

Minimum 5 values
Value  Count  Frequency (%)
0  3481154  93.4%
1  101735  2.7%
2  54733  1.5%
3  36185  1.0%
4  22482  0.6%

Maximum 5 values
Value  Count  Frequency (%)
37  1  < 0.1%
32  1  < 0.1%
31  1  < 0.1%
28  1  < 0.1%
27  2  < 0.1%

ZavC
Real number (ℝ≥0)

SKEWED
ZEROS
Distinct count: 15
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.009282074816
Minimum: 0
Maximum: 23
Zeros: 3705130
Zeros (%): 99.4%
Memory size: 28.4 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0
95th percentile: 0
Maximum: 23
Range: 23
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.1350141733
Coefficient of variation (CV): 14.54568897
Kurtosis: 831.5233199
Mean: 0.009282074816
Mean absolute deviation (MAD): 0.01844843891
Skewness: 21.48757874
Sum: 34607
Variance: 0.01822882699
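The MAD values in this report match the mean absolute deviation around the mean, not the median absolute deviation. ZavC has a median of 0 and 99.4% zeros, so its median absolute deviation would be 0, yet the report shows 0.0184. The sketch below contrasts the two measures and then checks the ZavC figure. When every value below the mean is 0, the mean absolute deviation equals 2 × mean × (share of zeros), because the deviations above and below the mean sum to the same amount.

```python
import math
import statistics


def mean_absolute_deviation(x):
    """Average distance from the mean."""
    m = statistics.fmean(x)
    return statistics.fmean(abs(v - m) for v in x)


def median_absolute_deviation(x):
    """Median distance from the median."""
    med = statistics.median(x)
    return statistics.median([abs(v - med) for v in x])


# Illustrative data with 99.4% zeros (not the real ZavC column).
data = [0] * 994 + [1] * 6
print(mean_absolute_deviation(data), median_absolute_deviation(data))

# Check against the ZavC summary: values below the mean are all zeros here,
# so mean absolute deviation = 2 * mean * share_of_zeros.
zavc_mean = 0.009282074816
share_zeros = 3_705_130 / 3_728_369
print(2 * zavc_mean * share_zeros)  # compare with the reported MAD of ZavC
```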
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.5 1.5 2.5 3.5 ... 5.5 6.5 8.5 11.5 23. ], "bayesian blocks" binning strategy used)
Common values
Value  Count  Frequency (%)
0  3705130  99.4%
1  15291  0.4%
2  5600  0.2%
3  1688  < 0.1%
4  440  < 0.1%
5  133  < 0.1%
6  51  < 0.1%
7  14  < 0.1%
8  11  < 0.1%
11  4  < 0.1%
Other values (5)  7  < 0.1%

Minimum 5 values
Value  Count  Frequency (%)
0  3705130  99.4%
1  15291  0.4%
2  5600  0.2%
3  1688  < 0.1%
4  440  < 0.1%

Maximum 5 values
Value  Count  Frequency (%)
23  1  < 0.1%
14  2  < 0.1%
12  1  < 0.1%
11  4  < 0.1%
10  1  < 0.1%

VyslSTK
Categorical

Distinct count: 4
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.6 MiB
Common values
Value  Count  Frequency (%)
2  3474003  93.2%
0  228059  6.1%
1  26259  0.7%
-1  48  < 0.1%

Length

Max length: 2
Mean length: 1.000012874
Min length: 1
Unicode categories
Value  Count  Frequency (%)
Decimal_Number  3  75.0%
Dash_Punctuation  1  25.0%

Unicode scripts
Value  Count  Frequency (%)
Common  4  100.0%

Unicode blocks
Value  Count  Frequency (%)
ASCII  4  100.0%

VyslEmise
Categorical

Distinct count: 4
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.6 MiB
Common values
Value  Count  Frequency (%)
3  2397529  64.3%
0  1323438  35.5%
2  6484  0.2%
1  918  < 0.1%

Length

Max length: 1
Mean length: 1
Min length: 1
Unicode categories
Value  Count  Frequency (%)
Decimal_Number  4  100.0%

Unicode scripts
Value  Count  Frequency (%)
Common  4  100.0%

Unicode blocks
Value  Count  Frequency (%)
ASCII  4  100.0%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and 1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables. As a result, for an exact linear relationship r is +1 or -1 whatever the (nonzero) slope of the line.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
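The definition translates directly into code. In the following minimal stdlib sketch, the normalising factors of the covariance and the two standard deviations cancel, so plain sums are enough. The usage example also shows that r does not change under separate changes of location and scale.

```python
import math


def pearson_r(x, y):
    """Pearson's r: covariance of x and y over the product of their standard deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("r is undefined for a constant variable")
    return cov / (sx * sy)


x = [1, 2, 3, 4]
y = [2, 1, 4, 3]
print(pearson_r(x, y))
print(pearson_r(x, [10 * v + 7 for v in y]))  # same r after rescaling and shifting y
```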

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables. It is therefore better than Pearson's r at capturing nonlinear monotonic correlations. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and 1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
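Spearman's ρ is Pearson's r applied to the ranks. The sketch below gives tied values the average of their ranks, which is the usual convention and the default of pandas' rank method. Pearson's r is repeated so that the block stands alone. In the usage example the relationship is monotonic but nonlinear, so ρ is 1 while r stays below 1.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson_r(x, y):
    """Covariance over the product of standard deviations (normalisations cancel)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r of the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but not linear
print(spearman_rho(x, y), pearson_r(x, y))
```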

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and 1 indicating total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, which is n(n - 1)/2 for n observations.
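Counting pairs directly gives the coefficient as described, usually called τ-a. Pairs tied in either variable count as neither concordant nor discordant. This O(n²) sketch is for illustration only. Library implementations such as scipy.stats.kendalltau default to the tie-adjusted τ-b, which can differ from τ-a when ties are present.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, i.e. Kendall's tau-a."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # they move in opposite directions
        # sign == 0: tied in x or y, counted as neither
    return (concordant - discordant) / (n * (n - 1) / 2)


print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```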

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for the phik package.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
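Below is a sketch of the bias-corrected Cramér's V for an r × k contingency table, following Bergsma's (2013) correction. First, φ² = χ²/n is reduced by (k - 1)(r - 1)/(n - 1). Second, r and k in the denominator are replaced by their corrected versions. The sketch assumes that every row and column of the table has a nonzero total.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V (Bergsma, 2013) for an r x k table of counts."""
    r, k = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(k)]
    n = sum(row_totals)
    if n < 2 or 0 in row_totals or 0 in col_totals:
        raise ValueError("need n >= 2 and no empty rows or columns")

    # Pearson's chi-squared statistic against the independence model.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        return 0.0  # degenerate table (a single row or column)
    return math.sqrt(phi2_corr / denom)


print(cramers_v_corrected([[10, 10], [10, 10]]))  # independent table
print(cramers_v_corrected([[50, 0], [0, 50]]))    # perfect association
```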

Missing values

Sample

First rows

STK DrTP VIN DatKont TypMot TZn DrVoz ObchOznTyp Ct DatPrvReg Km ZavA ZavB ZavC VyslSTK VyslEmise
031222JMZBLA2G6012585042018-01-02 11:03:12.833461943423205324422011-02-108481800020
13205241504172018-01-02 11:06:07.617354763744726295131989-01-013882800020
231142VF3MJAHXHGS2801682018-01-02 11:15:08.083218904173205621422017-01-093922700020
33618246998452018-01-02 11:19:22.96726564930201668421979-06-043895100020
437482WF0SXXGCDSAU067302018-01-02 11:30:25.4204104117712029420422010-06-2925419400020
538462JTJBC11A4024434272018-01-02 11:26:50.967818431522050282422012-09-2413025800020
633075W0922S235HNZ180702018-01-02 13:15:50.550-161032616527562017-07-27000020
737555TMBRD75L8A60126282018-01-02 12:11:56.7702669049302065317422009-12-2821893300023
831242WV2ZZZ7HZ9H0795902018-01-02 11:57:15.0202557258462020363422008-12-0823586500020
9371013WV2ZZZ7HZFH0623772018-01-02 11:54:17.9272644758462059044422015-01-195955700023

Last rows

STK DrTP VIN DatKont TypMot TZn DrVoz ObchOznTyp Ct DatPrvReg Km ZavA ZavB ZavC VyslSTK VyslEmise
3728359360313VF1BG0G06289957062018-03-14 14:58:12.5073867845152036839422003-12-2227912440023
3728360363313TMBDH25J7830088112018-03-14 15:08:20.8902579249302027530422008-03-127496110023
3728361383913TMBHG41U6Y23967962018-03-14 15:44:41.6032175149302044065422000-07-0316437710023
372836236372JMB0NV240SJ0104672018-03-14 15:48:07.1601160935832045096421998-05-1515905511000
3728363311413JN1CBAN16U00048002018-03-14 16:28:35.9635454938342015210422001-06-1223511520023
3728364374613VF37J9HXCAJ7237692018-03-15 06:55:47.5932047741732045238422010-08-2418531830023
372836537345XLRAE45CF0L2555482018-03-15 08:28:13.880269031197914801502004-03-2238191520023
3728366312813WDF639603130081732018-03-15 08:10:16.180154563494961376482004-01-0731454360023
3728367350613WBAAT91090KS226342018-03-15 11:22:12.2706776652206039422004-10-1826175760023
3728368310512TMBJB16Y8233523122018-03-15 08:29:56.9672262149302027530422002-01-246564850023